Statistical properties of the warped discrete cosine transform cepstrum compared with MFCC
Authors
Abstract
In this paper, we continue our investigation of the warped discrete cosine transform cepstrum (WDCTC), which was introduced earlier as a new speech processing feature [1]. Here, we study the statistical properties of the WDCTC and compare them with those of the mel-frequency cepstral coefficients (MFCC). Relative to the MFCC, the WDCTC shows several notable properties: its statistical distribution is more Gaussian-like with lower variance, its vowel clusters are tighter and better separated, and it generates better codebooks. Further, we employ the WDCTC and MFCC features in a 5-vowel recognition task using Vector Quantization (VQ) and 1-Nearest Neighbour (1-NN) as classifiers. In our experiments, the WDCTC consistently outperforms the MFCC.
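As a rough illustration of the two classifier types named in the abstract, the sketch below shows a 1-NN rule and a per-class VQ codebook rule operating on generic feature vectors. It is not the authors' implementation: the function names, the squared Euclidean distance, the k-means codebook training, and the codebook size are all assumptions made for illustration, and the vectors could be WDCTC or MFCC frames computed elsewhere.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbour(train, query):
    """1-NN: return the label of the training vector closest to `query`.

    `train` is a list of (vector, label) pairs.
    """
    return min(train, key=lambda item: sq_dist(item[0], query))[1]


def train_codebook(vectors, size, iters=20, seed=0):
    """Build a VQ codebook with plain k-means (an assumed training method)."""
    rng = random.Random(seed)
    codebook = rng.sample(vectors, size)  # initialise from training vectors
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for v in vectors:
            idx = min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))
            cells[idx].append(v)
        # Move each codeword to the mean of its cell; keep it if the cell is empty.
        codebook = [
            [sum(col) / len(cell) for col in zip(*cell)] if cell else codebook[i]
            for i, cell in enumerate(cells)
        ]
    return codebook


def vq_classify(codebooks, frames):
    """Pick the class whose codebook gives the lowest mean distortion.

    `codebooks` maps a label to its codebook; `frames` is a list of vectors.
    """
    def distortion(cb):
        return sum(min(sq_dist(c, f) for c in cb) for f in frames) / len(frames)

    return min(codebooks, key=lambda label: distortion(codebooks[label]))
```

In a setup like this, tighter and better separated class clusters would tend to lower the within-class distortion of each codebook and reduce nearest-neighbour confusions, which is consistent with the properties the abstract reports.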
Similar resources
Implementing frequency-warping and VTLN through linear transformation of conventional MFCC
In this paper, we show that frequency-warping (including VTLN) can be implemented through linear transformation of conventional MFCC. Unlike the Pitz-Ney [1] continuous domain approach, we directly determine the relation between frequency-warping and the linear-transformation in the discrete-domain. The advantage of such an approach is that it can be applied to any frequency-warping and is not ...
Speaker Identification using MFCC-Domain Support Vector Machine
Speech recognition and speaker identification are important for authentication and verification for security purposes, but they are difficult to achieve. Speaker identification methods can be divided into text-independent and text-dependent. This paper presents a technique of text-dependent speaker identification using MFCC-domain support vector machine (SVM). In this work, mel-frequency cepstrum c...
Amplitude modulation features for emotion recognition from speech
The goal of speech emotion recognition (SER) is to identify the emotional or physical state of a human being from his or her voice. One of the most important things in a SER task is to extract and select relevant speech features with which most emotions could be recognized. In this paper, we present a smoothed nonlinear energy operator (SNEO)-based amplitude modulation cepstral coefficients (AM...
Mel-scaled Wavelet Filter Base Unvoiced Phoneme Re
In this paper we propose a filter bank structure derived by using admissible wavelet packet transform. These filters have Mel scale spacing and have an advantage of easy implementation with higher resolution in time-frequency domain because of wavelet transform. The features are obtained by first calculating the energy in each filter band and then applying the Discrete Cosine Transform (DCT) to...
Noise-Robust Speech Features Based on Cepstral Time Coefficients
In this paper, we investigate the noise-robustness of features based on the cepstral time coefficients (CTC). By cepstral time coefficients, we mean the coefficients obtained from applying the discrete cosine transform to the commonly used mel-frequency cepstral coefficients (MFCC). Furthermore, we apply temporal filters used for computing delta and acceleration dynamic features to the CTC, res...